Unbiased split selection for classification trees based on the Gini Index
Similar resources
Unbiased split selection for classification trees based on the Gini Index
The Gini gain is one of the most common variable selection criteria in machine learning. We derive the exact distribution of the maximally selected Gini gain in the context of binary classification using continuous predictors by means of a combinatorial approach. This distribution provides a formal support for variable selection bias in favor of variables with a high amount of missing values wh...
Split Selection Methods for Classification Trees
Classification trees based on exhaustive search algorithms tend to be biased towards selecting variables that afford more splits. As a result, such trees should be interpreted with caution. This article presents an algorithm called QUEST that has negligible bias. Its split selection strategy shares similarities with the FACT method, but it yields binary splits and the final tree can be selected...
An (almost) unbiased estimator for the S-Gini index
This note provides an unbiased estimator for the absolute S-Gini and an almost unbiased estimator for the relative S-Gini for integer parameter values. Simulations indicate that these estimators perform considerably better than the usual estimators, especially for small sample sizes. 1 The absolute and relative S-gini indices Assume that income is distributed according to a continuous and diffe...
Statistical Sources of Variable Selection Bias in Classification Tree Algorithms Based on the Gini Index
Evidence for variable selection bias in classification tree algorithms based on the Gini Index is reviewed from the literature and embedded into a broader explanatory scheme: Variable selection bias in classification tree algorithms based on the Gini Index can be caused not only by the statistical effect of multiple comparisons, but also by an increasing estimation bias and variance of the spli...
An Improved Text Classification Method Based on Gini Index
In text classification, the Gini index can be used as a purity measure. The greater the purity value, the more information the attribute carries and the stronger its feature-distinguishing capability. However, when the Gini purity formula is used for feature weighting, the classification result is not very good; one of the main reasons is that rare words appearing in only one categor...
Journal
Journal title: Computational Statistics & Data Analysis
Year: 2007
ISSN: 0167-9473
DOI: 10.1016/j.csda.2006.12.030